 model operation


Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference

Zhang, Zeyu, Shen, Haiying

arXiv.org Artificial Intelligence

In large-language models, memory constraints in the key-value cache (KVC) pose a challenge during inference, especially with long prompts. In this work, we observed that compressing KV values is more effective than compressing the model regarding accuracy and job completion time (JCT). However, quantizing KV values and dropping less-important tokens incur significant runtime computational time overhead, delaying JCT. These methods also cannot reduce computation time or high network communication time overhead in sequence-parallelism (SP) frameworks for long prompts. To tackle these issues, based on our insightful observations from experimental analysis, we propose ZeroC, a Zero-delay QKV Compression system that eliminates time overhead and even reduces computation and communication time of the model operations. ZeroC innovatively embeds compression and decompression operations within model operations and adaptively determines compression ratios at a hybrid layer-token level. Further, it enables a communication-efficient SP inference framework. Trace-driven experiments demonstrate that ZeroC achieves up to 80% lower average JCT, 35% lower average perplexity, and 2.8x higher throughput with the same latency compared to state-of-the-art compression methods. ZeroC also reduces the average JCT of current LLM serving systems by up to 91% with the constraint of 0.1 perplexity increase. We open-sourced the code.
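
The memory pressure the abstract describes can be made concrete with a back-of-the-envelope estimate. The sketch below is not taken from ZeroC or its code. It uses the standard accounting for an uncompressed KV cache (one key and one value vector per layer, per head, per token). The function name `kv_cache_bytes` and the model dimensions in the example are illustrative assumptions.

```python
def kv_cache_bytes(num_layers: int,
                   num_kv_heads: int,
                   head_dim: int,
                   seq_len: int,
                   batch_size: int = 1,
                   bytes_per_value: int = 2) -> int:
    """Estimate the size of an uncompressed KV cache in bytes.

    The leading factor of 2 counts keys and values separately.
    bytes_per_value=2 corresponds to 16-bit storage (an assumption).
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    for seq_len in (1024, 4096, 16384):
        size = kv_cache_bytes(num_layers=32, num_kv_heads=32,
                              head_dim=128, seq_len=seq_len)
        print(f"{seq_len:>6} tokens: {size / 1024**3:.2f} GiB per sequence")
```

Under these assumed dimensions the cache grows linearly with prompt length, which is why long prompts are the case the abstract singles out.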


Model Operations for Secure and Reliable AI

#artificialintelligence

Artificial intelligence and machine learning show great potential across many application fields; however, very few companies engaged in a 4.0 transition path successfully implement these technologies in their business processes. What needs to be done to make such applications profitable? Artificial intelligence is a set of studies and techniques, rooted in information technology but with significant philosophical and social implications, whose purpose is to build programs and technological systems capable of solving problems and carrying out tasks normally attributed to the human mind and human capabilities. Given recent progress, artificial intelligence can be described as the discipline that deals with creating machines (hardware and software) capable of operating autonomously. The growing attention to this discipline is motivated by the results that technological maturity has made achievable, both in computing power and in the ability to analyze huge amounts of data in any form in real time [Big Data Analytics].


Machine Learning Model Development and Model Operations: Principles and Practices - KDnuggets

#artificialintelligence

The use of machine learning (ML) has increased substantially in enterprise data analytics, where it is used to extract valuable insights from business data. Hence, it is very important to have an ecosystem for building, testing, deploying, and maintaining enterprise-grade machine learning models in production environments. ML model development involves acquiring data from multiple trusted sources, processing the data to make it suitable for building the model, choosing an algorithm, building the model, computing performance metrics, and choosing the best-performing model. Model maintenance plays a critical role once the model is deployed into production. Maintaining a machine learning model includes keeping it up to date and relevant as the source data changes, since there is a risk of the model becoming outdated over time.
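
The development steps listed above (process data, choose an algorithm, build the model, compute metrics, pick the best performer) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the workflow from the article: the data set is made up, the two "algorithms" (`fit_threshold`, `fit_majority`) are deliberately trivial, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[float, int]          # (feature value, class label 0 or 1)
Model = Callable[[float], int]


def split(data: List[Example], train_fraction: float = 0.75):
    """Hold out the tail of the data for validation."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]


def fit_threshold(train: List[Example]) -> Model:
    """Classify by comparing against the midpoint of the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x >= threshold)


def fit_majority(train: List[Example]) -> Model:
    """Baseline: always predict the most common training label."""
    label = int(2 * sum(y for _, y in train) >= len(train))
    return lambda x: label


def accuracy(model: Model, data: List[Example]) -> float:
    return sum(model(x) == y for x, y in data) / len(data)


def select_best(data: List[Example]) -> Tuple[str, Dict[str, float]]:
    """Train each candidate, score it on held-out data, return the winner."""
    train, valid = split(data)
    candidates = {"threshold": fit_threshold, "majority": fit_majority}
    scores = {name: accuracy(fit(train), valid)
              for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores


# Made-up data: low feature values are class 0, high values are class 1.
DATA = [(0.1, 0), (0.9, 1), (0.2, 0), (0.8, 1),
        (0.3, 0), (0.7, 1), (0.4, 0), (0.6, 1)]

if __name__ == "__main__":
    best, scores = select_best(DATA)
    print(best, scores)
```

Scoring on held-out data rather than on the training set is the design choice that matters here; the maintenance step the excerpt mentions would amount to rerunning `select_best` as new data arrives.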


How ModelOps Helps You Execute Your AI Strategy

#artificialintelligence

Artificial Intelligence is a hotter topic today than ever. From self-driving cars to personal assistants, AI is slowly making its way into our daily lives. Artificial Intelligence (AI) is an area of computer science that studies the possibility of thinking computers and machines. There are already many applications in place that have been developed with the help of AI, including business applications. The past decade has seen an explosion of applications for artificial intelligence, machine learning, and deep learning. This has led to advances in a wide range of application domains, including document classification and processing, natural language understanding, and bioinformatics.


Model Operations for Secure and Reliable AI

#artificialintelligence

Artificial intelligence is a set of studies and techniques, rooted in information technology but with significant philosophical and social implications, whose purpose is to build programs and technological systems capable of solving problems and carrying out tasks normally attributed to the human mind and human capabilities. Given recent progress, artificial intelligence can be described as the discipline that deals with creating machines (hardware and software) capable of operating autonomously. The growing attention to this discipline is motivated by the results that technological maturity has made achievable, both in computing power and in the ability to analyze huge amounts of data in any form in real time [Big Data Analytics]. AI is a popular branch of computer science concerned with building "intelligent" machines capable of performing tasks that normally require intelligence. With rapid advancements in deep learning and machine learning, the tech industry is transforming radically.


Don't Let Tooling and Management Approaches Stifle Your AI Innovation

#artificialintelligence

It is no coincidence that companies are investing in AI at unprecedented levels at a time when they are under tremendous pressure to innovate. The artificial intelligence models developed by data scientists give enterprises new insights, enable new and more efficient ways of working, and help identify opportunities to reduce costs and introduce profitable new products and services. The possibilities for AI use grow almost daily, so it's important not to limit innovation. Unfortunately, many organizations do just that by tethering themselves to proprietary tools and solutions. This can handcuff data scientists and IT as new innovations become available, and results in higher costs than an open environment that supports best-of-breed AI model development and management.


What are model governance and model operations?

#artificialintelligence

Our surveys over the past couple of years have shown growing interest in machine learning (ML) among organizations from diverse industries. A few factors are contributing to this strong interest in implementing ML in products and services. First, the machine learning community has conducted groundbreaking research in many areas of interest to companies, and much of this research has been conducted out in the open via preprints and conference presentations.